Random graphs with clustering 

We offer a solution to a long-standing problem in the physics of networks, the creation of a
plausible, solvable model of a network that displays clustering or transitivity — the propensity for 
two neighbors of a network node also to be neighbors of one another. We show how standard 
random graph models can be generalized to incorporate clustering and give exact solutions for 
various properties of the resulting networks, including sizes of network components, size of the giant 
component if there is one, position of the phase transition at which the giant component forms, and 
position of the phase transition for percolation on the network. 

Many networks, perhaps most, show clustering or transitivity,
the propensity for two neighbors of the same vertex
also to be neighbors of one another, forming a triangle
of connections in the network. In a social network
of friendships between individuals, for example, there is 
a high probability that two friends of a given individual 
will also be friends of one another. The network aver- 
age of this probability is called the clustering coefficient 
for the network. Measured clustering coefficients for so- 
cial networks are typically on the order of tens of percent 
and similar values are seen in many nonsocial networks as 
well, including technological and biological networks.

Although clustering in networks has been known and 
discussed for many years, it has proved difficult to model 
mathematically. Our continued inability to create a plau- 
sible analytic model of clustered networks has been a 
substantial impediment to the development of a com- 
prehensive theory of networked systems and contrasts 
sharply with our successes in the modeling of other network
properties such as degree distributions and
correlations. A few network models, such as the
small-world model of Watts and Strogatz, do show
clustering and are at least approximately solvable, but 
are also rather specialized and not suitable as models of 
most real networks. A large number of computational 
models of clustered networks have been proposed that 
are more general in scope, almost all based on some form
of "triadic closure" process in which one searches an initially
unclustered network for pairs of vertices with a
common neighbor and then connects them to form triangles
[10, 11, 12, 13]. Unfortunately, because of the
nature of these models, the calculation of their properties
is limited to numerical approaches [14].

An ideal solution to these problems would be to gen- 
eralize the standard random graphs that form the foun- 
dation for much of modern network theory to create an 
ensemble model of clustered networks for which one could 
calculate ensemble average properties exactly. It has long 
been felt, however, that such an approach is likely to be 
unworkable because our ability to calculate the properties
of random graphs rests on the fact that they are
"locally tree-like," i.e., that they contain no short loops
in their structure. The triangles of clustered networks
violate this condition and hence one would expect their 
introduction into a random graph model to render the 
model intractable. 

But in this paper we show that this is not the case. 
We show that it is in fact possible to generalize ran- 
dom graphs to incorporate clustering in a simple, sensible 
fashion and to derive exact formulas for a wide variety of 
properties of the resulting networks. 

The model we propose generalizes the standard "configuration
model" of network theory, which is a model of a
random graph with arbitrary degree distribution [15].
In that model one specifies the number of edges connected
to each vertex. In our generalized model, pictured
in Fig. 1, we specify both the number of edges
and the number of triangles. For a network of $n$ vertices,
we define $t_i$ to be the number of triangles in which
vertex $i$ participates and $s_i$ to be the number of single
edges other than those belonging to the triangles. That
is, edges within triangles in this model are enumerated
separately from edges that are placed singly [11]. We can
think of a single edge as being a network element that 
joins together two vertices and a triangle as a different 
kind of element that joins three. In principle, one could 
generalize the model further to include higher-order ele- 
ments of four or more vertices. The techniques described 
here can be extended in a straightforward fashion to such 
cases. 




FIG. 1: In the model proposed here, we separately specify 
the number of single edges and complete triangles (shaded) 
attached to each vertex. 

We can think of $s_i$ as specifying the number of ends
or "stubs" of single edges that emerge from vertex $i$ and
$t_i$ as specifying the number of corners of triangles. The
complete joint degree sequence $\{s_i, t_i\}$ specifies the numbers
of such stubs and corners for every vertex. In simple
cases the values of $s_i$ and $t_i$ may be uncorrelated, but
correlated choices are also possible that allow us to reproduce
more complex behaviors seen in some networks,
such as variation of local clustering with degree [13].

Given the degree sequence, we create our network by 
choosing pairs of stubs uniformly at random and joining 
them to make complete edges, and also choosing trios of 
corners at random and joining them to form complete 
triangles. The end result is a network drawn uniformly 
at random from the set of all possible matchings of stubs 
and corners. The only constraint is that, in order that 
there be no stubs or corners left over at the end of the 
process, the total number of stubs must be a multiple 
of 2 and the total number of corners a multiple of 3. 
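
The stub-and-corner matching just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `clustered_configuration_model` and its arguments are hypothetical, a shuffle followed by consecutive grouping is used to draw the random matching, and no attempt is made to remove the self-edges or repeated edges that a random matching can occasionally produce.

```python
import random
from collections import Counter


def clustered_configuration_model(s, t, seed=None):
    """Match stubs in pairs and corners in trios, uniformly at random.

    s[i] is the number of single-edge stubs at vertex i and t[i] the number
    of triangle corners at vertex i (hypothetical argument names).
    Returns (edges, triangles) as lists of vertex tuples.
    """
    if sum(s) % 2 != 0:
        raise ValueError("total number of stubs must be a multiple of 2")
    if sum(t) % 3 != 0:
        raise ValueError("total number of corners must be a multiple of 3")
    rng = random.Random(seed)
    # One list entry per stub / corner, labelled by the vertex it belongs to.
    stubs = [i for i, si in enumerate(s) for _ in range(si)]
    corners = [i for i, ti in enumerate(t) for _ in range(ti)]
    # Shuffling and grouping consecutive entries gives a random matching.
    rng.shuffle(stubs)
    rng.shuffle(corners)
    edges = [(stubs[j], stubs[j + 1]) for j in range(0, len(stubs), 2)]
    triangles = [tuple(corners[j:j + 3]) for j in range(0, len(corners), 3)]
    return edges, triangles


# Usage with a small, made-up joint degree sequence.
s = [2, 1, 1, 0, 3, 1]
t = [1, 1, 1, 2, 2, 2]
edges, triangles = clustered_configuration_model(s, t, seed=1)
stub_count = Counter(v for edge in edges for v in edge)
corner_count = Counter(v for tri in triangles for v in tri)
```

By construction, `stub_count` and `corner_count` reproduce the input sequences `s` and `t`, whatever matching is drawn.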

We define the joint degree distribution $p_{st}$ of our network
to be the fraction of vertices connected to $s$ single
edges and $t$ triangles, a quantity that can be easily measured
for any observed network. Given this joint distribution,
the conventional degree distribution of the network,
the probability $p_k$ that a vertex has $k$ edges in total, both
singly and in triangles, is

$$ p_k = \sum_{s,t=0}^{\infty} p_{st}\,\delta_{k,s+2t}, \tag{1} $$

since each triangle connected to a vertex contributes 2 to
the degree and each single edge contributes 1. (Here $\delta_{ij}$
is the Kronecker delta.)
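
As a small illustration of Eq. (1), the sketch below collapses a joint distribution into the ordinary degree distribution by accumulating each $p_{st}$ at total degree $k = s + 2t$. The dictionary representation and the function name `total_degree_distribution` are assumptions made for this example only.

```python
from collections import defaultdict


def total_degree_distribution(p_st):
    """Return {k: p_k} from a joint distribution given as {(s, t): p_st}."""
    p_k = defaultdict(float)
    for (s, t), p in p_st.items():
        # Each single edge adds 1 to the degree, each triangle adds 2.
        p_k[s + 2 * t] += p
    return dict(p_k)


# Made-up example: degree 2 arises from one triangle or from two single edges.
p_st = {(0, 0): 0.25, (1, 0): 0.25, (0, 1): 0.25, (2, 0): 0.25}
p_k = total_degree_distribution(p_st)
```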

As with other random graph models, calculations for 
the model presented in this paper make use of probability 
generating functions. The generating function for the 
joint degree distribution of our network is a function of 
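two variables thus:

$$ g_p(x,y) = \sum_{s,t=0}^{\infty} p_{st}\, x^s y^t. \tag{2} $$

We can also write down a generating function for the
total degree distribution $p_k$ thus:

$$ f(z) = \sum_{k=0}^{\infty} p_k z^k = \sum_{k=0}^{\infty} \sum_{s,t=0}^{\infty} p_{st}\,\delta_{k,s+2t}\, z^k = \sum_{s,t=0}^{\infty} p_{st}\, z^{s+2t} = g_p(z, z^2). \tag{3} $$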

We can use these generating functions to calculate, for
instance, the clustering coefficient $C$ of the network. The
clustering coefficient can be defined as

$$ C = \frac{3 \times (\text{number of triangles in network})}{(\text{number of connected triples})} = \frac{3N_\triangle}{N_3}, \tag{4} $$

where a connected triple means a single vertex connected
by edges to two others. For the present model we have

$$ 3N_\triangle = n \sum_{s,t} t\, p_{st} = n \left( \frac{\partial g_p}{\partial y} \right)_{x=y=1}, \tag{5} $$

$$ N_3 = n \sum_k \binom{k}{2} p_k = \tfrac{1}{2}\, n \left( \frac{\partial^2 f}{\partial z^2} \right)_{z=1}, \tag{6} $$

and substituting into Eq. (4) then gives us the value of the
clustering coefficient. Note that the factors of $n$ cancel
out in the substitution, giving a value of $C$ that remains
nonzero in the limit $n \to \infty$ so that the network always
has clustering, by contrast with the configuration model
and similar random graphs for which $C \to 0$.
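
The ratio in Eq. (4) can be evaluated numerically from any joint distribution: per vertex, $3N_\triangle/n$ is the mean number of triangle corners $\langle t \rangle$, and $N_3/n$ is $\sum_k \binom{k}{2} p_k$ with $k = s + 2t$. The sketch below is illustrative only. The helper names are hypothetical, and the test distribution is a product of two Poisson distributions with means `mu` and `nu` truncated at an arbitrary cutoff. For that particular choice one finds $f(z) = e^{\mu(z-1)+\nu(z^2-1)}$, so $f''(1) = 2\nu + (\mu+2\nu)^2$ and $C = 2\nu/[2\nu + (\mu+2\nu)^2]$, which the code uses as a cross-check.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)


def doubly_poisson(mu, nu, cutoff=40):
    """Truncated product of two Poisson distributions, as {(s, t): p_st}."""
    return {(s, t): poisson_pmf(s, mu) * poisson_pmf(t, nu)
            for s in range(cutoff) for t in range(cutoff)}


def clustering_coefficient(p_st):
    """C = <t> / sum_k C(k,2) p_k, i.e. 3 N_triangle / N_3 per vertex."""
    mean_t = sum(t * p for (s, t), p in p_st.items())
    triples = sum((s + 2 * t) * (s + 2 * t - 1) / 2 * p
                  for (s, t), p in p_st.items())
    return mean_t / triples if triples > 0 else 0.0


mu, nu = 1.0, 0.5
numeric = clustering_coefficient(doubly_poisson(mu, nu))
closed_form = 2 * nu / (2 * nu + (mu + 2 * nu) ** 2)
```

The result depends only on the moments of $p_{st}$ and not on the network size $n$, in line with the remark that the factors of $n$ cancel.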

A further quantity that will be important in the following
calculations is the so-called excess degree distribution.
In the current model there are actually two
different excess degree distributions:

$$ q_{st} = \frac{(s+1)\, p_{s+1,t}}{\langle s \rangle}, \qquad r_{st} = \frac{(t+1)\, p_{s,t+1}}{\langle t \rangle}, \tag{7} $$

where $\langle s \rangle$ and $\langle t \rangle$ are the averages of $s$ and $t$ over all vertices.
Here $q_{st}$ is the distribution of the number of edges
and triangles attached to a vertex reached by traversing
an edge, excluding the traversed edge, and $r_{st}$ is the corresponding
distribution for a vertex reached by traversing
a triangle. The generating functions for these distributions
are

$$ g_q(x,y) = \sum_{s,t=0}^{\infty} q_{st}\, x^s y^t = \frac{1}{\langle s \rangle} \sum_{s,t=0}^{\infty} s\, p_{st}\, x^{s-1} y^t = \frac{1}{\langle s \rangle} \frac{\partial g_p}{\partial x}, \tag{8} $$

$$ g_r(x,y) = \sum_{s,t=0}^{\infty} r_{st}\, x^s y^t = \frac{1}{\langle t \rangle} \sum_{s,t=0}^{\infty} t\, p_{st}\, x^s y^{t-1} = \frac{1}{\langle t \rangle} \frac{\partial g_p}{\partial y}. \tag{9} $$

One of the definitive features of any network is its giant 
component — the portion of the network that is connected 
into a single extensive group such that any vertex in the 
group can be reached from any other via the network. In 
a communication network, for example, the giant com- 
ponent corresponds to the fraction of vertices that can 
actually intercommunicate, the rest being isolated in dis- 
connected small components. We can use our generating 
functions to calculate the size of the giant component in 
the clustered network. 

Let $u$ be the mean probability that a vertex reached
by traversing a single edge is not a member of the giant
component and $v$ be the corresponding probability
for a vertex reached by traversing a triangle. (Equivalently,
$v^2$ is the probability that a triangle doesn't lead
to the giant component via either of the vertices at its
other corners.) In order for a vertex at the end of a single
edge not to belong to the giant component, all the
other vertices to which it is connected, either by edges
or by triangles, must also not be members of the giant
component. If it is connected to $s$ other edges and $t$ triangles,
then this happens with probability $u^s v^{2t}$. The
generalized degrees $s$ and $t$ are distributed according to
the excess degree distribution $q_{st}$ and, averaging over this
distribution, we find

$$ u = \sum_{s,t} q_{st}\, u^s v^{2t} = g_q(u, v^2). \tag{10} $$

By a similar argument we also find that

$$ v = g_r(u, v^2). \tag{11} $$

Then the probability that a randomly chosen vertex is
not in the giant component is $\sum_{s,t} p_{st}\, u^s v^{2t} = g_p(u, v^2)$
and the expected size $S$ of the giant component as a
fraction of the entire network is one minus this quantity:

$$ S = 1 - g_p(u, v^2). \tag{12} $$

Between them, Eqs. (10) to (12) allow us to calculate the
size of the giant component if there is one.
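
One way to use Eqs. (10) to (12) in practice is simple fixed-point iteration: start from $u = v = 0$, apply $u \leftarrow g_q(u, v^2)$ and $v \leftarrow g_r(u, v^2)$ repeatedly, and then evaluate $S = 1 - g_p(u, v^2)$. The sketch below does this for a joint distribution supplied as a dictionary. The function names, the dictionary format, the starting point, and the stopping tolerance are all assumptions of this example rather than anything prescribed in the text.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)


def doubly_poisson(mu, nu, cutoff=30):
    """Truncated product of two Poisson distributions, as {(s, t): p_st}."""
    return {(s, t): poisson_pmf(s, mu) * poisson_pmf(t, nu)
            for s in range(cutoff) for t in range(cutoff)}


def giant_component_size(p_st, tol=1e-12, max_iter=10000):
    """Iterate u = g_q(u, v^2), v = g_r(u, v^2); return S = 1 - g_p(u, v^2)."""
    mean_s = sum(s * p for (s, t), p in p_st.items())
    mean_t = sum(t * p for (s, t), p in p_st.items())

    def g_p(x, y):
        return sum(p * x**s * y**t for (s, t), p in p_st.items())

    def g_q(x, y):  # (1/<s>) dg_p/dx
        if mean_s == 0:
            return 1.0  # no single edges: u never enters g_p
        return sum(s * p * x**(s - 1) * y**t
                   for (s, t), p in p_st.items() if s > 0) / mean_s

    def g_r(x, y):  # (1/<t>) dg_p/dy
        if mean_t == 0:
            return 1.0  # no triangles: v never enters g_p
        return sum(t * p * x**s * y**(t - 1)
                   for (s, t), p in p_st.items() if t > 0) / mean_t

    u = v = 0.0
    for _ in range(max_iter):
        new_u, new_v = g_q(u, v * v), g_r(u, v * v)
        converged = abs(new_u - u) < tol and abs(new_v - v) < tol
        u, v = new_u, new_v
        if converged:
            break
    return 1.0 - g_p(u, v * v)


S = giant_component_size(doubly_poisson(1.0, 0.5))
```

Starting from zero means the iterates increase monotonically toward the smallest fixed point, so the trivial solution $u = v = 1$ is reached only when no smaller solution exists.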

As an example, consider a network that has the doubly
Poisson degree distribution

$$ p_{st} = e^{-\mu} \frac{\mu^s}{s!}\; e^{-\nu} \frac{\nu^t}{t!}, \tag{13} $$

where the parameters $\mu$ and $\nu$ are the average numbers of
single edges and triangles per vertex respectively. Then

$$ g_p(x,y) = g_q(x,y) = g_r(x,y) = e^{\mu(x-1)} e^{\nu(y-1)}, \tag{14} $$

and $u = v = 1 - S$, leading to

$$ S = 1 - e^{-[\mu S + \nu S(2-S)]}. \tag{15} $$

This is a transcendental equation that has no closed-form
solution (other than the trivial solution $S = 0$) but it
can easily be solved by numerical iteration starting from
a suitable initial value. The right-hand panel of Fig. 2
shows the resulting giant component size as a function 
of clustering coefficient for a network with fixed average 
degree. As the figure shows, the size of the giant compo- 
nent falls off with increasing clustering coefficient, which 
happens because the triangles that give the network its 
clustering contain redundant edges that serve no pur- 
pose in connecting the giant component together. One 
edge out of every three in a triangle is redundant in this 
way. Thus for a given average degree, and hence a given 
total number of edges, fewer vertices can be connected 
together in a network of triangles than in a network of 
single edges. 
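
The numerical iteration for Eq. (15), $S = 1 - e^{-[\mu S + \nu S(2-S)]}$, can be sketched as follows. The starting value $S = 1$, the tolerance, and the function names are choices made for this illustration. The second helper fixes the average degree $\mu + 2\nu$ and picks $\nu$ to give a target clustering coefficient, using the relation $C = 2\nu/[2\nu + (\mu+2\nu)^2]$ that follows from Eqs. (4) to (6) for the doubly Poisson case.

```python
import math


def giant_component_poisson(mu, nu, tol=1e-12, max_iter=100000):
    """Solve S = 1 - exp(-[mu*S + nu*S*(2 - S)]) by iteration from S = 1."""
    S = 1.0
    for _ in range(max_iter):
        S_new = 1.0 - math.exp(-(mu * S + nu * S * (2.0 - S)))
        if abs(S_new - S) < tol:
            return S_new
        S = S_new
    return S


def giant_component_fixed_degree(C, mean_degree=2.0):
    """Giant component size at clustering C with mu + 2*nu held fixed."""
    nu = C * mean_degree**2 / (2.0 * (1.0 - C))
    mu = mean_degree - 2.0 * nu
    if mu < 0:
        raise ValueError("clustering too high for this mean degree")
    return giant_component_poisson(mu, nu)


sizes = [giant_component_fixed_degree(C) for C in (0.0, 0.1, 0.2, 0.3)]
```

With the mean degree held at 2, the values in `sizes` decrease as `C` grows, which is the trend described for the right-hand panel of Fig. 2.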

FIG. 2: Right panel: the size of the giant component in
networks with the degree distribution of Eq. (13) and average
degree $\mu + 2\nu = 2$, as a function of clustering coefficient. Left
panel: the size of the giant cluster for percolation on the same
networks for values of the clustering coefficient C = 0, 0.1, 0.2,
and 0.3, as a function of bond occupation probability $\phi$. Note
that when $\phi = 1$ the giant cluster and giant component have
the same size, as indicated by the dotted lines.

We can also calculate the sizes of the small components
in the network. Let $h_q(z)$ be the generating function for
the distribution of number of vertices accessible, either
directly or indirectly, via the vertex at the end of a single
edge, and similarly for $h_r(z)$ and triangles. Then, by an
argument analogous to that of [6], we can show that

$$ h_q(z) = z\, g_q\bigl(h_q(z), h_r^2(z)\bigr), \qquad h_r(z) = z\, g_r\bigl(h_q(z), h_r^2(z)\bigr), \tag{16} $$

and the probability that a randomly chosen vertex any- 
where in the network belongs to a component of a given 
size is generated by

$$ h_p(z) = z\, g_p\bigl(h_q(z), h_r^2(z)\bigr). \tag{17} $$

Then, for example, the mean size of the component to
which a vertex belongs is

$$ h_p'(1) = 1 + g_p^{(1,0)}(1,1)\, h_q'(1) + 2\, g_p^{(0,1)}(1,1)\, h_r'(1), \tag{18} $$

where $g_p^{(m,n)}$ is $g_p$ differentiated $m$ times with respect to
its first argument and $n$ times with respect to its second.

The derivatives $h_q'(1)$ and $h_r'(1)$ in Eq. (18) can be
found from Eq. (16) by differentiating, setting $x = y = 1$,
and making use of Eqs. (8) and (9), which gives

$$ h_q'(1) = 1 + \frac{1}{\langle s \rangle} H_{xx}\, h_q'(1) + \frac{2}{\langle s \rangle} H_{xy}\, h_r'(1), \tag{19} $$

$$ h_r'(1) = 1 + \frac{1}{\langle t \rangle} H_{yx}\, h_q'(1) + \frac{2}{\langle t \rangle} H_{yy}\, h_r'(1), \tag{20} $$

where the $H$ variables are the elements of the Hessian
matrix $\mathbf{H}$ of second derivatives of $g_p$, evaluated at the
point $x = y = 1$:

$$ H_{xy} = \left( \frac{\partial^2 g_p}{\partial x\, \partial y} \right)_{x=y=1}, \tag{21} $$

and so forth.
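
We can write Eqs. (19) and (20) in matrix form as
$\mathbf{h} = \mathbf{1} + \boldsymbol{\alpha}^{-1} \mathbf{H} \boldsymbol{\beta} \cdot \mathbf{h}$, where the vectors $\mathbf{h}$ and $\mathbf{1}$ are
$\mathbf{h} = (h_q'(1), h_r'(1))$ and $\mathbf{1} = (1,1)$ and the diagonal matrices
$\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are

$$ \boldsymbol{\alpha} = \begin{pmatrix} \langle s \rangle & 0 \\ 0 & \langle t \rangle \end{pmatrix}, \qquad \boldsymbol{\beta} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. \tag{22} $$

Rearranging yields $(\mathbf{I} - \boldsymbol{\alpha}^{-1} \mathbf{H} \boldsymbol{\beta}) \cdot \mathbf{h} = \mathbf{1}$, where $\mathbf{I}$ is the
identity matrix, and by inverting this equation and combining
the result with Eq. (18) we can find the average
component size.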
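
The inversion described above involves only a $2 \times 2$ system, so it can be written out explicitly. The sketch below takes the moments $\langle s \rangle$, $\langle t \rangle$, $H_{xx} = \langle s(s-1) \rangle$, $H_{xy} = \langle st \rangle$ and $H_{yy} = \langle t(t-1) \rangle$ as inputs, solves $(\mathbf{I} - \boldsymbol{\alpha}^{-1}\mathbf{H}\boldsymbol{\beta}) \cdot \mathbf{h} = \mathbf{1}$ by Cramer's rule, and applies Eq. (18) with $g_p^{(1,0)}(1,1) = \langle s \rangle$ and $g_p^{(0,1)}(1,1) = \langle t \rangle$. The function name and argument order are hypothetical, and the sketch assumes both $\langle s \rangle$ and $\langle t \rangle$ are nonzero and that there is no giant component. As an assumption of this sketch, a non-positive determinant is treated as lying at or beyond the transition.

```python
def mean_component_size(mean_s, mean_t, h_xx, h_xy, h_yy):
    """Mean component size h_p'(1) from the moments of p_st."""
    # Entries of A = I - alpha^{-1} H beta, with alpha = diag(<s>, <t>)
    # and beta = diag(1, 2).
    a11 = 1.0 - h_xx / mean_s
    a12 = -2.0 * h_xy / mean_s
    a21 = -h_xy / mean_t
    a22 = 1.0 - 2.0 * h_yy / mean_t
    det = a11 * a22 - a12 * a21
    if det <= 0:
        # Assumption of this sketch: treat this as "no finite mean size".
        raise ValueError("mean component size diverges or is undefined")
    # Cramer's rule for A . (hq, hr) = (1, 1).
    hq = (a22 - a12) / det
    hr = (a11 - a21) / det
    # Eq. (18): h_p'(1) = 1 + <s> h_q'(1) + 2 <t> h_r'(1).
    return 1.0 + mean_s * hq + 2.0 * mean_t * hr


# Doubly Poisson example with mu = 0.3, nu = 0.2:
# H_xx = mu^2, H_xy = mu*nu, H_yy = nu^2.
size = mean_component_size(0.3, 0.2, 0.09, 0.06, 0.04)
```

For the doubly Poisson case all three generating functions coincide, so the result should equal $1/(1 - \mu - 2\nu)$, which gives a convenient check on the algebra.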

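The average will diverge at the point where
$\det(\mathbf{I} - \boldsymbol{\alpha}^{-1} \mathbf{H} \boldsymbol{\beta}) = 0$ and, performing the derivatives in Eq. (21),
we find the following condition for the point at which the
giant component forms:

$$ \left[ \frac{\langle s^2 \rangle}{\langle s \rangle} - 2 \right] \left[ \frac{2 \langle t^2 \rangle}{\langle t \rangle} - 3 \right] = \frac{2 \langle st \rangle^2}{\langle s \rangle \langle t \rangle}. \tag{23} $$

In the case where there are no triangles in the network,
this equation reduces to the well known criterion
$\langle s^2 \rangle / \langle s \rangle - 2 = 0$ of Molloy and Reed for the phase
transition in the ordinary configuration model. When
triangles are present, Eq. (23) gives the appropriate generalization
of that criterion.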
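
As a worked check of this criterion, consider the doubly Poisson distribution of Eq. (13), for which $s$ and $t$ are independent with $\langle s \rangle = \mu$, $\langle s^2 \rangle = \mu^2 + \mu$, $\langle t \rangle = \nu$, $\langle t^2 \rangle = \nu^2 + \nu$ and $\langle st \rangle = \mu\nu$. The derivation below is an illustration added here, taking Eq. (23) in the form $[\langle s^2 \rangle/\langle s \rangle - 2][2\langle t^2 \rangle/\langle t \rangle - 3] = 2\langle st \rangle^2/(\langle s \rangle \langle t \rangle)$:

```latex
\begin{align*}
\frac{\langle s^2 \rangle}{\langle s \rangle} - 2 &= \mu - 1, \qquad
\frac{2\langle t^2 \rangle}{\langle t \rangle} - 3 = 2\nu - 1, \qquad
\frac{2\langle st \rangle^2}{\langle s \rangle \langle t \rangle} = 2\mu\nu, \\
(\mu - 1)(2\nu - 1) &= 2\mu\nu
\quad\Longrightarrow\quad
1 - \mu - 2\nu = 0
\quad\Longrightarrow\quad
\mu + 2\nu = 1 .
\end{align*}
```

In this case the giant component therefore forms when the mean total degree $\mu + 2\nu$ reaches one, which agrees with linearizing Eq. (15) about $S = 0$.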

We can calculate many other properties of our networks,
including average path lengths and vertex connection
probabilities. As our final example in this paper
we demonstrate the calculation of percolation properties
of random graphs with clustering. Both site
and bond percolation processes on networks have important
applications: site percolation is related to network
resilience [3], while bond percolation is related
to the dynamics of disease and other spreading processes [21].
Consider, for instance, a bond percolation
process on our model network, with each edge in the
network occupied independently with probability $\phi$. By
analogy with our earlier calculations, let $u$ be the probability
that a vertex is not connected to the percolating
(giant) cluster of this percolation process via one of its
single edges, and let $v^2$ be the corresponding probability
for a triangle.

If a vertex is not connected to the giant cluster via
a given single edge then one of two things must be true:
either the edge is not occupied, which happens with probability
$1 - \phi$, or it is occupied but the vertex at its end
is itself not connected to the giant cluster via any of its
other edges or triangles of which, let us say, there are
$s$ and $t$ respectively. This second process happens with
probability $\phi u^s v^{2t}$. But $s$ and $t$ are by definition distributed
according to the excess degree distribution $q_{st}$
and, averaging over this distribution, we then find that

$$ u = 1 - \phi + \phi \sum_{s,t} q_{st}\, u^s v^{2t} = 1 - \phi \left[ 1 - g_q(u, v^2) \right]. \tag{24} $$

The corresponding equation for triangles is more involved,
but still essentially straightforward to derive:

$$ v^2 = 1 - 2\phi(1-\phi)^2 \left[ 1 - g_r(u, v^2) \right] - \phi^2 (3 - 2\phi) \left[ 1 - g_r^2(u, v^2) \right]. \tag{25} $$

(Notice that Eqs. (24) and (25) reduce to Eqs. (10)
and (11) for the giant component of the network, as they
should, when $\phi = 1$.)

Now the size $S$ of the giant cluster of the percolation
process is given by $S = 1 - g_p(u, v^2)$. The left-hand
panel of Fig. 2 shows $S$ as a function of $\phi$ for the Poisson
network of Eq. (13), for fixed average degree and several
different values of the clustering coefficient. As the figure
shows, higher clustering pushes the percolation transition
toward lower values of $\phi$, which can be understood as an
effect of the redundant paths introduced by the triangles
in the network, which provide more opportunities to connect
clusters together. At the same time, the ultimate
size of the giant cluster as $\phi$ approaches 1 is smaller in
more clustered networks and indeed becomes equal to the
size of the giant component when $\phi = 1$, as indicated by
the dashed lines in the figure. Other properties of the
percolation process can be calculated in a similar fash- 
ion, including the position of the percolation threshold, 
the mean size of small clusters, and the complete distri- 
bution of sizes of small clusters. 
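
For the doubly Poisson network, where $g_p = g_q = g_r = e^{\mu(x-1)} e^{\nu(y-1)}$, the percolation equations can be iterated directly. The sketch below updates $u$ via $u = 1 - \phi[1 - g(u, v^2)]$ and $v^2$ via $v^2 = 1 - 2\phi(1-\phi)^2[1 - g] - \phi^2(3-2\phi)[1 - g^2]$, then returns $S = 1 - g(u, v^2)$. The function name, the variable `w` standing for $v^2$, the starting point, and the tolerance are assumptions of this example.

```python
import math


def percolation_giant_cluster(mu, nu, phi, tol=1e-12, max_iter=100000):
    """Giant cluster size for bond percolation on the doubly Poisson network."""

    def g(x, y):
        # Common generating function g_p = g_q = g_r for this distribution.
        return math.exp(mu * (x - 1.0) + nu * (y - 1.0))

    u = w = 0.0  # w stands for v^2
    for _ in range(max_iter):
        gv = g(u, w)
        # Single edge: unoccupied, or occupied but leading nowhere.
        new_u = 1.0 - phi * (1.0 - gv)
        # Triangle: one corner reached w.p. 2*phi*(1-phi)^2,
        # both corners reached w.p. phi^2*(3-2*phi).
        new_w = (1.0
                 - 2.0 * phi * (1.0 - phi) ** 2 * (1.0 - gv)
                 - phi ** 2 * (3.0 - 2.0 * phi) * (1.0 - gv ** 2))
        converged = abs(new_u - u) < tol and abs(new_w - w) < tol
        u, w = new_u, new_w
        if converged:
            break
    return 1.0 - g(u, w)


# Same mean degree (2), with and without clustering, at phi = 0.48.
clustered = percolation_giant_cluster(1.0, 0.5, 0.48)
unclustered = percolation_giant_cluster(2.0, 0.0, 0.48)
```

At $\phi = 0.48$ the clustered network already has a giant cluster while the unclustered one, whose threshold is $\phi = 1/2$ at this mean degree, does not. This is consistent with the statement that clustering pushes the transition toward lower $\phi$.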

To conclude, we have proposed a random-graph model 
of a clustered network that is exactly solvable for many 
of its properties including component sizes, existence and 
size of a giant component, and percolation properties. 
The model answers a long-standing question in the study 
of networks by showing how to construct an unbiased en- 
semble of networks with clustering, and could form the 
basis for future investigations of the effects of clustering 
on many processes of interest, including epidemic pro- 
cesses, network resilience, and dynamical systems on net- 
works. 